Chinese Terminology Extraction Using Window-Based Contextual Information
Authors
Abstract
Terminology extraction is an important task for the automatic update of domain-specific knowledge. Contextual information helps to decide whether newly extracted terms are terminology or not. Because extraction based on fixed patterns is of very limited use for handling natural language text, both syntactic and semantic information in the context of a term are needed to determine its termhood. In this paper, we investigate two window-based context word extraction methods that take syntactic and semantic information into account. Based on the performance of each method individually, a hybrid method that combines both syntactic and semantic information is proposed. Experiments show that the hybrid method can achieve significant improvement.
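As a rough illustration of what "window-based" context word collection can mean, the sketch below counts the words that fall within a fixed number of tokens on either side of a candidate term. It is not the authors' method, which the abstract does not specify: the function name `context_words`, the `window` parameter, and the toy sentence are all hypothetical, and the text is assumed to be word-segmented already.

```python
from collections import Counter


def context_words(tokens, term, window=2):
    """Count words within `window` tokens of each occurrence of `term`.

    `tokens` is a pre-segmented sentence (list of words); `term` is the
    candidate term as a list of tokens, since a term may span several words.
    This is an illustrative assumption, not the paper's procedure.
    """
    counts = Counter()
    n = len(term)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term:
            # Left context: up to `window` tokens before the term.
            counts.update(tokens[max(0, i - window):i])
            # Right context: up to `window` tokens after the term.
            counts.update(tokens[i + n:i + n + window])
    return counts


# Toy sentence, assumed to be segmented upstream.
tokens = ["我们", "使用", "支持", "向量", "机", "进行", "文本", "分类"]
print(context_words(tokens, ["支持", "向量", "机"], window=2))
```

A syntactic or semantic variant would filter or weight these context words (for example by part of speech or by a domain lexicon) before using them as evidence of termhood; which filters the paper uses is not stated in the abstract.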
Similar resources
NLP Techniques for Term Extraction and Ontology Population
This chapter investigates NLP techniques for ontology population, using a combination of rule-based approaches and machine learning. We describe a method for term recognition using linguistic and statistical techniques, making use of contextual information to bootstrap learning. We then investigate how term recognition techniques can be useful for the wider task of information extraction, makin...
A Comparative Study of the Effect of Word Segmentation On Chinese Terminology Extraction
Automatic term extraction is the first step towards automatic or semi-automatic update of an existing domain knowledge base. Most studies apply word segmentation as a preprocessing step for Chinese term extraction. However, segmentation ambiguity is unavoidable, especially in identifying unknown words in Chinese. In this paper, we discuss the effect and limitations of segmentation to C...
Dynamic Signature Verification and Authentication Based on Extraction of Stable Dominant Points and Segmentation of Signature Patterns
One of the basic problems in signature verification is the variability apparent in signature patterns, even for a single individual. Segmenting a signature into its basic components, in addition to giving access to stable features, reveals hidden differences between genuine and forged patterns. In this paper, two-dimensional signature patterns are segmented by using dominant poi...
Two-Character Chinese Word Extraction Based on Hybrid of Internal and Contextual Measures
Word extraction is one of the important tasks in text information processing. There are mainly two kinds of statistic-based measures for word extraction: the internal measure and the contextual measure. This paper discusses these two kinds of measures for Chinese word extraction. First, nine widely adopted internal measures are tested and compared on an individual basis. Then various schemes of com...
Identifying Contextual Information for Multi-Word Term Extraction
Methods for multi-word term extraction have traditionally involved statistical techniques. More recently, hybrid techniques have been evolving which incorporate some linguistic knowledge. This information is generally very shallow, and researchers have tended to ignore any real understanding of either terms or the context in which they appear. We adopt an approach which uses a variety of knowle...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007